Performance Analysis of Runtime Data Declustering over SAN-Connected PC Cluster
Abstract
Personal computer/workstation (PC/WS) clusters have been studied intensively in the field of parallel and distributed computing. Because of their good scalability and cost-performance ratio, they are expected to play an important role as next-generation large-scale computer systems, such as large server sites and high-performance parallel computers. From the application viewpoint, data-intensive applications, including data mining and ad hoc query processing in databases, are considered very important for high-performance computing, in addition to conventional scientific calculation. Investigating the feasibility of such applications on a PC cluster is therefore meaningful. In this paper, a PC cluster connected by a Storage Area Network (SAN) is built and evaluated. In a SAN cluster, each node can access all shared disks directly without using the LAN; thus, SAN clusters achieve much better performance than LAN clusters for disk access operations. However, if many nodes access the same shared disk simultaneously, application performance degrades because of the I/O bottleneck. To resolve this problem, a runtime data declustering method is proposed, in which data is declustered to several other disks dynamically during execution of the application. Parallel data mining is implemented and evaluated on the SAN-connected PC cluster. This application requires iterative scans of a shared disk, which severely degrade execution performance because of the I/O bottleneck. The runtime data declustering method is applied, and system characteristics such as I/O and network operations are evaluated in detail. According to the experimental results, the proposed method prevents the performance degradation caused by the shared-disk bottleneck in SAN clusters.
Similar resources
Runtime Data Declustering based on Bandwidth-on-Demand and its Evaluation over SAN-connected PC cluster
Clusters of computers have recently come into use at large-scale server sites because of their good scalability and cost/performance ratio. In addition, Storage Area Networks (SANs) have been introduced to consolidate the back end of such systems. The I/O bottleneck is a serious problem in such an environment, because some important data-intensive applications often access part of the data concurrently and repeatedly...
Runtime Data Declustering over SAN-Connected PC Cluster System
Recently, personal computer/workstation (PC/WS) clusters have been studied intensively in the field of parallel and distributed computing. From the application viewpoint, data-intensive applications, including data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, in addition to conventional scientific calculation. Thus, ...
Data mining on PC cluster connected with storage area network: its preliminary experimental results
Personal computer/workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. Because of their good scalability and cost-performance ratio, they are expected to play an important role as large-scale computer systems, such as large server sites and high-performance parallel computers. From the application viewpoint, data inte...
Run-Time Load Balancing System on SAN-connected PC Cluster for Dynamic Injection of CPU and Disk Resource - A Case Study of Data Mining Application
A PC cluster system is an attractive platform for data-intensive applications. However, the conventional shared-nothing system is limited in its load-balancing performance, and it is difficult to change the number of nodes and disks dynamically during execution. In this paper, we develop dynamic resource injection, where the system can inject CPU power and expand I/O bandwidth by adding nodes and disks dy...
Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments
Personal computer/workstation clusters have been studied intensively in the field of parallel and distributed computing. From the application viewpoint, data-intensive applications such as data mining and ad hoc query processing in databases are considered very important for high-performance computing, as well as conventional scientific calculations. We have built and evaluated PC cluster pil...
Publication date: 2002